FOREWORD


The SURVEILLANCE SYSTEMS Work Unit within the Army Research Institute (ARI) has as its objective the production of scientific data bearing on the extraction of information from surveillance displays and the efficient storage, retrieval, and transmission of this information within an advanced computerized image interpretation facility. Research results are used in future systems design and in the development of enhanced techniques for all phases of the interpretation process.

ARI research in this area is conducted as an in-house research effort augmented by contracts with organizations selected as having unique capabilities and facilities for research in aerial surveillance. The entire research program is responsive to requirements of Army RDT&E Project 2Q662704A721, Surveillance Systems, FY 1973 Work Program.

The ARI Work Unit, Surveillance Systems, is conducting research to determine how interpreter performance is affected by variations in the character of the image. A primary objective is to develop an instrument for use in evaluating imagery for interpretability: in effect, an image quality catalog. An analysis based on an analogy with signal detection concepts has been reported in Technical Research Report 1178, "Development of a psychophysical photo quality measure."

The research reported here was accomplished jointly by personnel of the Stanford Research Institute and by the Systems Integration and Command/Control Technical Area, Organization and Systems Research Laboratory of the U. S. Army Research Institute for the Behavioral and Social Sciences. The Institute, established 1 October 1972 as replacement for the U. S. Army Manpower Resources Research and Development Center, unifies in one enlarged organization all OCRD activities in the behavioral and social sciences area, including those conducted by the former Behavior and Systems Research Laboratory (BESRL) and the Motivation and Training Laboratory (MTL). The present publication reports on a special analysis of the data collected as a basis for development of the psychophysical photo quality measure and identifies atmospheric haze as an additional dimension to be included in such a measure.
EFFECT OF PHOTO DEGRADATIONS ON INTERPRETER PERFORMANCE


BRIEF 


Requirement: 

To identify photo dimensions frequently responsible for quality degradation of operationally obtained aerial reconnaissance photographic film and to assess their effect on the accuracy and completeness with which trained image interpreters can detect and identify tactical targets.


Procedure: 

Factors contributing to poor-quality photo-mission coverage were isolated by detailed examination of reconnaissance photography in several military film repositories. Three dimensions were selected for experimental manipulation: photo scale (four levels), haze (three levels), and blur (four levels). Of the 48 experimental conditions possible, 13 were selected for the research. Each of the 13 aerial scenes was treated photographically to produce the 13 treatment conditions. The interpreter's task was to view a serially numbered set of circled areas on each photographic transparency, to judge whether or not each circled area contained targets, and to identify all targets. Scores on accuracy and completeness of target detection and identification were computed for each experimental subject. Means and standard deviations were obtained for every treatment condition. Analysis of variance was used to determine the statistical significance of treatment effects, and Duncan's multiple range test was used to evaluate the statistical significance of differences among the various means.


Findings: 

When variations in photo scale, haze, and blur were present separately in photographic transparencies, there was little change in target detection performance. When two or more of these sources of degradation were present simultaneously, target detection deteriorated markedly.

Target identification accuracy and completeness were significantly reduced by either unidimensional or multidimensional degrading conditions of the type included in the investigation.

When photo scale was small, the effect of other sources of degradation on interpreter performance was significantly greater than when photo scale was large.

Degradation of overall target detection accuracy was due more to erroneous classification of non-targets as targets than to classification of targets as non-targets.
Utilization of Findings:


The findings of this technology base research provide direction for a continuing search for improved techniques for predicting the utility of aerial reconnaissance photographic missions and for guiding the G2 Air Officer in establishing mission requirements.

In addition, the findings will be useful in revising BESRL's photo quality catalog, from which measures of the interpretability of specific imagery are derived. Estimates are now based on comparison with catalog images varying in scale, sharpness, and scene complexity. The results reported here indicate that variation in atmospheric haze should be added to these as another index of the amount of information to be expected from interpretation of the imagery.
EFFECT OF PHOTO DEGRADATION ON INTERPRETER PERFORMANCE


BACKGROUND 

The quality of the aerial photographs from which the image interpreter must extract intelligence information contributes importantly to the accuracy and completeness of his target detections and identifications. Although personal characteristics of the interpreter are important in determining the absolute level of performance in a given circumstance, losses due to individual differences are variable, while those due to photo degradations tend to be more generalized, resulting in some performance loss for all interpreters. To predict the mean target detection and identification accuracy associated with photographs degraded by specified amounts, BESRL developed a photo quality catalog.1/ This catalog contains 231 photo transparencies that vary in scale, image sharpness, and scene complexity. Scene complexity refers to the amount of confusion introduced into the interpretation task by the scene background. A low-complexity scene would be characterized by flat open terrain where the target objects are readily distinguishable from the background. A high-complexity scene would contain many natural features, such as rocks and vegetation, that make it exceedingly difficult to separate target from background. In estimating the interpretability of a photograph, the interpreter compares the photograph with the catalog transparencies and finds the catalog image that he judges to match the photograph most closely in scale, sharpness, and scene complexity. The number of that catalog image is then used to enter the table provided with the catalog to obtain the predicted level of accuracy in target detection and target identification.
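The catalog procedure amounts to a nearest-match lookup. The Python sketch below assumes, purely for illustration, that scale, sharpness, and scene complexity can each be expressed as a number, that scale differences are compared on a logarithmic axis, and that the three dimensions are weighted equally. The class `CatalogImage`, the function `closest_catalog_image`, and the three catalog entries are hypothetical and are not taken from the BESRL catalog; in practice the match was a visual judgment made by the interpreter.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogImage:
    number: int        # catalog image number used to enter the prediction table
    scale: float       # scale denominator, e.g. 8000 for 1:8,000
    sharpness: float   # hypothetical 0-1 rating (1 = sharpest)
    complexity: float  # hypothetical 0-1 rating (0 = flat open terrain)


def closest_catalog_image(catalog, scale, sharpness, complexity):
    """Return the catalog image nearest the photograph on all three dimensions."""

    def distance(image):
        # Assumption: scale is compared on a log2 axis, so each halving of scale
        # counts equally; the three dimensions are weighted equally.
        d_scale = math.log2(image.scale / scale)
        return math.hypot(d_scale, image.sharpness - sharpness,
                          image.complexity - complexity)

    return min(catalog, key=distance)


# Hypothetical entries for illustration only (not from the BESRL catalog).
HYPOTHETICAL_CATALOG = [
    CatalogImage(1, 2000, 0.9, 0.2),
    CatalogImage(2, 8000, 0.6, 0.5),
    CatalogImage(3, 12000, 0.3, 0.8),
]

match = closest_catalog_image(HYPOTHETICAL_CATALOG, 7500, 0.55, 0.45)
print(match.number)  # the catalog number then indexes the prediction table
```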

In subsequent research, BESRL sought to identify other photo dimensions that should be incorporated in the photo quality catalog. A search of operational reconnaissance film repositories led the investigators to conclude that photo scale, haze due to atmospheric attenuation, and blur resulting from camera movement or faulty image motion compensation were the most common causes of photographic degradation. Photographic materials of excellent quality containing tactical targets were selected and degraded by darkroom procedures to produce photographs representing the range of the experimental variables selected for the research. Four levels of photo scale, three levels of haze, and four levels of blur were specified for the experiment. Because of the expense involved in treating imagery and testing large numbers of experimental subjects, only 13 of the 48 possible treatment combinations were selected for the experiment. Data were collected from trained image interpreters assigned to the 15th Military Intelligence Battalion, Aerial Reconnaissance and Support, located at Fort Bragg, North Carolina. These data were reduced and analyzed using techniques and procedures developed in signal detection theory. A simple model was developed to predict target detection performance from the ground resolved distance (GRD) estimate for the photograph.2/ The degree of agreement between predicted performance and empirical observation was examined using rank-difference correlation.

1/ Brainard, R. W., L. J. Lopez, G. N. Ornstein, and R. Sadacca. Development and evaluation of a catalog technique for measuring image quality. Behavior and Systems Research Laboratory Technical Research Report 1150. Arlington, Virginia. August 1966.


PURPOSE 

The present purpose was to re-analyze the data from the development of the psychophysical measure using different statistical procedures. The present treatment yields results that can be stated in terms consistent with those of other reports of research conducted in ARI's surveillance research program. A reader familiar with the research literature of aerial surveillance but unacquainted with the terms used in signal detection theory (receiver operating characteristic, or ROC, curves, for example) may find the present conceptualization more in keeping with specific interest in image interpretation than that presented in the earlier report on this research. Specific objectives of the present analysis were:

1. To determine the mean detection accuracy for the 13 treatment conditions. (Detection accuracy and completeness are equivalent indexes when the subjects are required to respond to a fixed set of annotated locations on the imagery.)

2. To determine the mean target identification accuracy for each of the 13 treatment conditions.

3. To determine the mean target identification completeness for the 13 treatment conditions.

4. To determine the mean detection accuracy/completeness separately for target and non-target annotations.
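Objective 4 splits detection accuracy by annotation type. Using the response cells f1 to f4 defined under Dependent Variables (Figure 2), a minimal Python sketch might compute the proportion of target annotations classified correctly, the proportion of non-target annotations classified correctly, and the overall index. The function name `detection_rates` and the example counts are hypothetical; the counts are merely chosen to match the per-scene averages of 8 target and 7 non-target annotations.

```python
def detection_rates(f1, f2, f3, f4):
    """Split detection performance by annotation type, using the cells of Figure 2.

    f1: target annotations called target      f3: non-target annotations called target
    f2: target annotations called non-target  f4: non-target annotations called non-target
    """
    target_rate = f1 / (f1 + f2)      # share of target annotations classified correctly
    nontarget_rate = f4 / (f3 + f4)   # share of non-target annotations classified correctly
    overall = (f1 + f4) / (f1 + f2 + f3 + f4)
    return target_rate, nontarget_rate, overall


# Hypothetical counts for one scene with 8 target and 7 non-target annotations.
target_rate, nontarget_rate, overall = detection_rates(f1=7, f2=1, f3=3, f4=4)
print(f"{target_rate:.3f} {nontarget_rate:.3f} {overall:.3f}")  # prints 0.875 0.571 0.733
```

In this made-up case the overall index hides a much lower rate on non-target annotations, the kind of asymmetry the Brief describes when it attributes most of the loss in detection accuracy to non-targets being classified as targets.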


METHOD 


Experimental  Design 


Four levels of photo scale, three levels of haze, and four levels of image motion were established for the three independent variables of the experiment. Figure 1 shows a schematic representation of the research design. The total number of possible treatment conditions is 48, but because of the work and expense involved in preparing imagery for all possible conditions, only 13 were selected for the actual experiment. The 13 experimental conditions chosen are indicated in Figure 1 by the crosshatched cells, and Table 1 gives the level of scale, haze, and image motion for each.

2/ Clarke, F. R., R. L. Welch, and T. E. Jeffrey. Development of a psychophysical photo quality measure. Army Research Institute Technical Research Report 1178. Arlington, Virginia. 1973.
Figure 1. Schematic of Experimental Design


Table 1

EXPERIMENTAL CONDITION CODES AND INDEPENDENT VARIABLE LEVELS

Experimental   Code    Approximate   Haze           Image
Condition              Scale(a)                     Motion
Number
1              1-1-1   1:2,000       None           None
2              2-1-1   1:4,000       None           None
3              3-1-1   1:8,000       None           None
4              4-1-1   1:12,000      None           None
5              1-2-1   1:2,000       1:4 contrast   None
6              1-3-1   1:2,000       1:2 contrast   None
7              1-1-2   1:2,000       None           .025 mm
8              1-1-3   1:2,000       None           .050 mm
9              1-1-4   1:2,000       None           .100 mm
10             1-3-4   1:2,000       1:2 contrast   .100 mm
11             4-1-4   1:12,000      None           .100 mm
12             4-3-1   1:12,000      1:2 contrast   None
13             4-3-4   1:12,000      1:2 contrast   .100 mm

(a) See Table A-1 in Appendix A for actual scale values.
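The condition codes in Table 1 read scale-haze-motion, with level 1 the best level of each dimension. The sketch below, a Python illustration rather than anything from the original study, enumerates the 4 x 3 x 4 = 48 cells of the full design, confirms that the 13 selected codes lie within it, and decodes a code into the approximate levels of Table 1. The dictionary and function names are hypothetical.

```python
from itertools import product

# Approximate levels from Table 1; level 1 is the best level of each dimension.
SCALE = {1: "1:2,000", 2: "1:4,000", 3: "1:8,000", 4: "1:12,000"}
HAZE = {1: "None", 2: "1:4 contrast", 3: "1:2 contrast"}
MOTION = {1: "None", 2: ".025 mm", 3: ".050 mm", 4: ".100 mm"}

SELECTED = ["1-1-1", "2-1-1", "3-1-1", "4-1-1", "1-2-1", "1-3-1", "1-1-2",
            "1-1-3", "1-1-4", "1-3-4", "4-1-4", "4-3-1", "4-3-4"]


def full_design():
    """All scale-haze-motion codes of the complete 4 x 3 x 4 factorial."""
    return [f"{s}-{h}-{m}" for s, h, m in product(SCALE, HAZE, MOTION)]


def decode(code):
    """Translate a condition code into its Table 1 levels."""
    s, h, m = (int(part) for part in code.split("-"))
    return SCALE[s], HAZE[h], MOTION[m]


def degraded_dimensions(code):
    """Number of dimensions set below their best level (level 1)."""
    return sum(int(part) != 1 for part in code.split("-"))


cells = full_design()
print(len(cells), len(SELECTED), set(SELECTED) <= set(cells))
print(decode("1-3-4"), degraded_dimensions("4-3-4"))
```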


Development  of  Experimental  Imagery 

Fourteen large-scale, good-quality negative transparencies were selected as the basic photographic imagery from which the experimental stimuli were to be prepared. Each 9 x 9-inch transparency depicted a unique scene. Thirteen of these scenes were used for the collection of response data; the fourteenth was used as a practice scene to acquaint the subjects with the experimental task.

Target and non-target objects or areas were circled (annotated) on each of the 14 original negatives. The per-scene average was about 15 annotations: 8 containing targets and 7 without targets. Since target annotations could contain multiple targets, the average number of targets per scene exceeded the mean number of target annotations; there was an average of about 15 targets per scene for the 13 experimental images.

Photo scale, haze, and blur were varied photographically, separately and in combination, to produce positive transparencies of every scene under each of the 13 treatment conditions. Photo scale was varied by standard photo reduction methods. The haze effect was obtained by fogging the film using a beam splitter. The image movement effect was produced by moving the film easel at a controlled rate during exposure of the film. The practice image was reproduced at 1:9,600 scale without haze or blur. Ten annotations were present on this practice image.

The complete set of imagery consisted of 169 unique images: 13 scenes, each at 13 treatment conditions. Multiple copies of the practice image were produced. Three complete sets of the experimental imagery were prepared so that separate packages of stimulus materials could be made up in which scene and treatment conditions were varied. Each envelope contained one image of each of the 13 scenes, each scene produced under a unique treatment condition. The practice image was contained in a small envelope, and each of the stimulus images was numbered to permit ready identification.
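Because each envelope held one image of each scene, each under a different condition, 13 envelopes built from one complete set of the 169 images form a 13 x 13 Latin square. The actual presentation orders are those given in Table A-2; the cyclic rule below (scene s shown under condition (s + k) mod 13 in envelope k, with 0-based numbering) is only an assumed example of a packaging scheme that satisfies these constraints, and the function names are hypothetical.

```python
N = 13  # number of scenes and of treatment conditions


def envelope(k):
    """Hypothetical cyclic rule: in envelope k, scene s is shown under condition (s + k) mod N.

    Scenes and conditions are numbered 0..12 here rather than 1..13.
    """
    return [(scene, (scene + k) % N) for scene in range(N)]


def is_valid_packaging(envelopes):
    """True if each envelope uses N distinct conditions and the envelopes together
    use every one of the N * N scene-condition images exactly once."""
    distinct_within = all(len({cond for _, cond in env}) == N for env in envelopes)
    images = [image for env in envelopes for image in env]
    return distinct_within and len(images) == N * N and len(set(images)) == N * N


one_set = [envelope(k) for k in range(N)]
print(is_valid_packaging(one_set))
```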


Sample 


Image interpreters assigned to the 15th Military Intelligence Battalion, Aerial Reconnaissance and Support, located at Fort Bragg, North Carolina, served as the experimental subjects. The men participating were mostly recent graduates of the Image Interpretation Course conducted at the U. S. Army Intelligence School, then located at Fort Holabird, Maryland. Records from 26 of the 48 men tested were used in the present analysis.


Data  Collection 

Men were tested in groups of 13. Each man was provided with a light table, a 7-power tube magnifier, pencils, a response booklet, and an envelope containing the experimental imagery. A target list like that appearing in Table 2 completed the materials furnished.

The experimenter instructed the group to fill out the biographical data requested on the cover sheet of the response booklet and, after all had completed this step, asked them to take out the practice image and place it on the light table. In a step-by-step sequence, the subjects were instructed in the procedure they were to follow in examining each annotation and in writing their responses in the answer booklet. After completion of the practice image and the resolution of all questions posed by the interpreters concerning the task, the experimenter proceeded with the administration of the experimental task. Rest breaks were given between the administration of successive images. Response booklets and experimental imagery were collected after completion of the final image. Four separate groups of 13 subjects each completed the task. Table A-2 shows the order of treatment conditions for the 26 men whose responses were analyzed for the present purpose.

Table 2

TARGET LIST

Targets

Vehicles (Utility, Cotrano, etc.)
    Truck, 2-1/2 ton
    Truck, 3/4 ton
    Truck, 1/4 ton
    Semitrailer, tank, gasoline
    Tow truck
    Tractor and semitrailer (van)
Armor
    Tank
    APC
Trailer
    1-1/2 ton and tank
    3/4 ton
    1/4 ton
Guns
    Howitzer (self-propelled)
    Howitzer (towed)
Tents
    Large, CP
    Medium, CP, squad
    Small, pup
Canvas, shelter, ammo
Latrines
Shower points
Foxholes, one and two man
Weapons pits
Helicopters, utility
Personnel
Semipermanent and permanent buildings of military design, such as quonset, butler, etc.

Nontargets

Bushes, trees, logs, etc.
Old building foundations
Aircraft shadows
Vehicle tracks
Crates, boxes
Farm buildings
Farm vehicles
Civilian vehicles on highways
Livestock


Dependent  Variables 

The dependent variables included measures of target detection and target identification. The correctness with which the subject detected the presence or absence of targets in the annotations was used to derive indexes of detection performance. Figure 2 is a schematic presentation of the categories into which the subject's responses were classified.


                          TRUTH
                     Target      Non-Target
RESPONSE
    Target             f1            f3
    Non-Target         f2            f4
    None               f5*           f6*

* Shaded cells.

Figure 2. Response categories


Two ratio measures, one for accuracy and one for completeness, were derived as measures of detection performance:

Detection accuracy. Number of annotations correctly classified, expressed as a ratio of all responses made by the subject:

$$\text{Detection accuracy} = \frac{f_1 + f_4}{(f_1 + f_3) + (f_2 + f_4)} \qquad (1)$$

Detection completeness. Number of annotations correctly classified, expressed as a ratio of the total number of annotations in the imagery:

$$\text{Detection completeness} = \frac{f_1 + f_4}{(f_1 + f_3) + (f_2 + f_4) + (f_5 + f_6)} \qquad (2)$$
All subjects were required to respond to all annotations; therefore f5 = f6 = 0, these being the frequencies in the shaded cells of Figure 2. Equations (1) and (2) thereby reduce to the same expression, and the results for detection performance are reported by a single index.
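Equations (1) and (2) can be checked with a few lines of Python. The sketch below takes the six cell frequencies of Figure 2 as a dictionary keyed 1 to 6 (an assumed representation) and shows that the two indexes coincide when f5 = f6 = 0, as in this experiment, but diverge when some annotations go unanswered. The counts are hypothetical.

```python
def detection_accuracy(f):
    """Equation (1): correct classifications over all classification responses made."""
    return (f[1] + f[4]) / ((f[1] + f[3]) + (f[2] + f[4]))


def detection_completeness(f):
    """Equation (2): correct classifications over all annotations, answered or not."""
    return (f[1] + f[4]) / ((f[1] + f[3]) + (f[2] + f[4]) + (f[5] + f[6]))


# Hypothetical cell frequencies keyed by the subscripts of Figure 2.
answered_all = {1: 7, 2: 1, 3: 3, 4: 4, 5: 0, 6: 0}  # f5 = f6 = 0, as in this experiment
skipped_two = {1: 7, 2: 1, 3: 3, 4: 4, 5: 1, 6: 1}   # two annotations left unanswered

print(detection_accuracy(answered_all) == detection_completeness(answered_all))
print(detection_accuracy(skipped_two), detection_completeness(skipped_two))
```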

Target identification performance was determined by the subject's ability to name the targets in annotations that he correctly classified as target annotations. In Figure 2, these are in the cell labeled f1. For target annotations properly classified as target annotations, the subject could identify the targets correctly (R), or he could make the following errors: misidentify a target (Wm), fail to identify a target and thus omit reporting an identification (O), or give a target identification for a non-target, thereby inventing a target (Wi). Any targets reported by the subject for annotations classified in the cell labeled f3 had to be of the inventive type of wrong response.
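These scoring rules, together with the omission rule for cell f2 stated in the next paragraph, can be expressed as a per-annotation scoring function. The report does not say how a wrong name was distinguished from an invention when an interpreter reported more or fewer names than there were targets. The sketch therefore assumes that exact name matches score R, that unmatched true targets and unmatched reported names are paired off as misidentifications (Wm), and that any remainder counts as omissions (O) or inventions (Wi). The function `score_annotation` is hypothetical.

```python
from collections import Counter


def score_annotation(true_targets, reported, called_target):
    """Return (R, Wm, O, Wi) for one annotation.

    true_targets:  names of the targets actually inside the annotation (empty if none)
    reported:      names of the targets the interpreter wrote down
    called_target: True if the interpreter classified the annotation as a target annotation
    """
    truth, said = Counter(true_targets), Counter(reported)
    if not called_target:
        # Cells f2 and f4: any targets present are omissions (there are none in f4).
        return 0, 0, sum(truth.values()), 0
    if not truth:
        # Cell f3: every name reported for a non-target annotation is an invention.
        return 0, 0, 0, sum(said.values())
    # Cell f1: exact name matches are correct identifications.
    r = sum((truth & said).values())
    missed = sum((truth - said).values())  # true targets not correctly named
    extra = sum((said - truth).values())   # reported names without an exact match
    wm = min(missed, extra)                # assumption: pair leftovers as misidentifications
    return r, wm, missed - wm, extra - wm


print(score_annotation(["Tank", "APC"], ["Tank", "Truck, 3/4 ton"], True))
```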

Targets actually present in target annotations erroneously classified by the subject as non-target annotations, falling in the cell labeled f2 in Figure 2, were scored as omissions. The two indexes for target identification performance were:

Target identification accuracy. Number of correct identifications, expressed as a ratio of the total number of target identifications reported:

$$\text{Target identification accuracy} = \frac{R}{R + W_m + W_i}$$

Target identification completeness. Number of correct identifications, expressed as a ratio of the total number of targets present in the imagery:

$$\text{Target identification completeness} = \frac{R}{R + W_m + O}$$
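Summing per-annotation counts gives the totals that enter the two identification indexes. A minimal Python sketch follows; the function names and the per-annotation scores are hypothetical.

```python
def identification_accuracy(r, wm, wi):
    """Correct identifications over all identifications reported: R / (R + Wm + Wi)."""
    return r / (r + wm + wi)


def identification_completeness(r, wm, o):
    """Correct identifications over all targets present: R / (R + Wm + O)."""
    return r / (r + wm + o)


def totals(scores):
    """Sum per-annotation (R, Wm, O, Wi) tuples into totals for one image or subject."""
    r, wm, o, wi = (sum(column) for column in zip(*scores))
    return r, wm, o, wi


# Made-up per-annotation scores for one image.
r, wm, o, wi = totals([(3, 1, 0, 0), (2, 0, 2, 1), (4, 2, 1, 5)])
print(identification_accuracy(r, wm, wi), identification_completeness(r, wm, o))
```

Inventions (Wi) lower accuracy but not completeness, omissions (O) lower completeness but not accuracy, and misidentifications (Wm) lower both.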


Statistical  Computations 

The basic data required to obtain the frequencies indicated in Figure 2 were obtained by scoring the response booklets of the 26 interpreter subjects. The correctness of their target identifications and the numbers of misidentifications, inventions, and omissions were determined. For each of the 26 subjects, the accuracy of detection and the accuracy and completeness of target identification were computed. Tables B-1, B-4, and B-7 list the values of these indexes of performance for each interpreter subject for each of the 13 treatment conditions. Tables B-2, B-5, and B-8 list the same indexes of performance for each interpreter for each of the unique image scenes. Means and standard deviations for the three indexes of performance were computed and are given in Table 3. Analysis of variance was used to test the statistical significance of treatment effects, and Duncan's multiple range test3/ was used to test the differences in mean performance among the treatment conditions.
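The per-condition means and standard deviations and the F ratios of an analysis of variance can be sketched in plain Python. This simplified version treats a subject-by-condition score matrix such as Table B-1 as a two-way layout with one score per cell and tests subjects and conditions against the residual mean square; the analyses summarized in Tables B-3, B-6, and B-9 also included images as a factor, which this sketch omits. The n - 1 divisor for the standard deviation is an assumption, converting F to a P value is left out, and Duncan's multiple range test (whose critical ranges come from published tables) is not implemented. The function names and the tiny score matrix are hypothetical.

```python
import statistics


def condition_summary(matrix):
    """Mean and standard deviation of each column (treatment condition).

    matrix[i][j] is subject i's score under condition j. The n - 1 divisor
    (statistics.stdev) is an assumption; the report does not state which was used.
    """
    return [(statistics.mean(col), statistics.stdev(col)) for col in zip(*matrix)]


def two_way_anova(matrix):
    """Subjects x conditions ANOVA with one score per cell (no replication).

    Subjects and conditions are each tested against the residual
    (subject x condition) mean square. Images are not modeled here.
    """
    n_subj, n_cond = len(matrix), len(matrix[0])
    grand = sum(map(sum, matrix)) / (n_subj * n_cond)
    row_means = [sum(row) / n_cond for row in matrix]
    col_means = [sum(col) / n_subj for col in zip(*matrix)]
    ss_subj = n_cond * sum((m - grand) ** 2 for m in row_means)
    ss_cond = n_subj * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in matrix for x in row)
    ss_resid = ss_total - ss_subj - ss_cond
    df_subj, df_cond = n_subj - 1, n_cond - 1
    df_resid = df_subj * df_cond
    ms_resid = ss_resid / df_resid
    return {
        "F_subjects": (ss_subj / df_subj) / ms_resid,
        "F_conditions": (ss_cond / df_cond) / ms_resid,
        "df": (df_subj, df_cond, df_resid),
    }


# Tiny made-up matrix: 3 subjects (rows) x 3 conditions (columns).
scores = [[1, 2, 3],
          [2, 4, 3],
          [3, 4, 5]]
print(condition_summary(scores))
print(two_way_anova(scores))
```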


RESULTS  AND  DISCUSSION 


Detection  Accuracy 

Under the best circumstances, the average interpreter correctly classified target and non-target annotations in about 80 percent of the cases. The detection accuracy column of Table 3 shows that for treatment condition 1-1-1 (imagery of the largest scale with no degradation due to atmospheric attenuation or image movement), detection accuracy was .80. The poorest detection accuracy occurred with treatment condition 4-3-1 (imagery of the smallest scale, maximum haze, but no image movement). Here detection accuracy was .62, indicating that the average interpreter classified the annotations correctly about 62 percent of the time.

A natural question concerns the statistical significance of such differences: does the mean performance of these 26 interpreters vary significantly as a result of the treatment conditions used? To answer this question, the variance of the treatment-by-subject score matrix in Table B-1 was analyzed. There were no true replications across subjects for these data. In the experiment, each image interpreter was presented the image scenes in precisely the same order. This procedure facilitated the conduct of the experiment and avoided the possibility that one subject might obtain information about a subsequent scene from one of his fellow subjects. Even with the scene order fixed, the number of orders in which 13 treatment conditions can be presented is very large. Table A-2 shows the order in which the treatment conditions were presented to each of the 26 subjects for the practice image and the 13 test scenes.

The analysis of variance summary appears in Table B-3. Main effects (subjects, images, and experimental conditions) are significant beyond the .01 level. A test of the differences among all possible pairs of treatment means was made using Duncan's multiple range test (Table 4). The following generalizations appear warranted. Interpreter ability to distinguish target and non-target objects at specified locations on an image is not significantly reduced when the three degrading factors employed in this experiment are introduced singly in treating the imagery. However, with one exception, when these factors were used in combination to degrade the imagery, detection performance deteriorated significantly. The one exception, treatment condition 1-3-4, appears to indicate that large image scale may offset the effects produced by the other two degrading factors. Imagery produced under this treatment condition was of the largest scale, about 1:2,000, with maximum haze effect and greatest blurring due to image movement. Mean detection performance for imagery degraded in this fashion was not significantly poorer than that for imagery degraded in only one dimension or not degraded at all.

3/ Edwards, A. L. Experimental design in psychological research. New York: Holt, Rinehart, and Winston, 1963, 236 ff.

Table 3

MEAN DETECTION AND IDENTIFICATION ACCURACY AND COMPLETENESS

Treatment     Detection              Identification         Identification
Combination   Accuracy/Completeness  Accuracy               Completeness
              Mean     S.D.          Mean     S.D.          Mean     S.D.
1-1-1         --       --            .564     .254          .635     .256
2-1-1         --       --            .467     .230          .521     .198
3-1-1         .743     .141          .392     .183          .488     .204
4-1-1         .736     .131          .374     .261          .349     .209
1-2-1         .752     .140          .422     .266          .439     .274
1-3-1         .774     .177          .420     .256          .370     .227
1-1-2         .771     .183          .445     .239          .503     .283
1-1-3         .749     .172          .412     .271          .433     .261
1-1-4         .764     .152          .402     .233          .448     .244
1-3-4         .738     .142          .369     .254          .358     .237
4-1-4         .670     .174          .242     .231          .222     .231
4-3-1         .622     .191          .189     .205          .162     .174
4-3-4         .632     .154          .196     .192          .150     .179

-- Entry illegible in the source copy; the text reports a mean detection accuracy of .80 for condition 1-1-1.

The three treatment conditions that produced the greatest loss in detection performance were all at the smallest scale, about 1:12,000. While no data were available for intermediate photo scales coupled with degradations produced by simulated atmospheric attenuation and blurring due to image movement, it seems reasonable to assume that when image scale is small, any additional loss in image quality brought about by other degrading factors will be accompanied by a significant reduction in detection accuracy. The 13 treatment conditions selected for the present experiment have provided some evidence concerning the effect of these factors on detection accuracy.
Statistical significance level of differences is indicated by an underscore where P ≤ .01 and without an underscore where P ≤ .05.


Undoubtedly, the foregoing is known to those who plan operational aerial surveillance missions. If point targets are to be detected, the altitude of the aircraft, focal length of the lens, time of day, amount of turbulence, and so forth are considered as the mission is planned. After the mission is flown, the suitability of the imagery acquired can be judged prior to interpretation. If the image scale is small and the imagery is degraded by factors other than scale, the G2 Air Officer may decide to have the mission re-flown immediately in order to meet mission requirements.
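The statement that the three largest losses in detection accuracy all occurred at the smallest scale can be read directly from Table 3. The sketch below ranks the legible detection means. Condition 2-1-1 is omitted because its table entry is illegible in this copy, and condition 1-1-1 uses the .80 reported in the text. Ranking raw means is only descriptive; the significance statements rest on Duncan's test. The names are hypothetical.

```python
# Mean detection accuracy from Table 3. Condition 2-1-1 is left out because its
# entry is illegible; 1-1-1 uses the .80 reported in the text.
DETECTION_MEANS = {
    "1-1-1": .80, "3-1-1": .743, "4-1-1": .736, "1-2-1": .752,
    "1-3-1": .774, "1-1-2": .771, "1-1-3": .749, "1-1-4": .764,
    "1-3-4": .738, "4-1-4": .670, "4-3-1": .622, "4-3-4": .632,
}


def lowest(means, k=3):
    """Return the k condition codes with the lowest means, worst first."""
    return sorted(means, key=means.get)[:k]


def scale_level(code):
    """First digit of the code: 1 = about 1:2,000 ... 4 = about 1:12,000."""
    return int(code.split("-")[0])


worst = lowest(DETECTION_MEANS)
print(worst, [scale_level(code) for code in worst])
```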


Identification  Accuracy 


Table B-4 lists the identification accuracy scores of the 26 interpreters for each of the 13 treatment conditions, and Table B-5 presents similar scores for the 13 image scenes. Table B-6 summarizes the analysis of variance of these data. Main effects (subjects, images, and experimental conditions) were statistically significant at better than the one percent level.

The differences in mean performance among the 13 treatment conditions were compared using Duncan's multiple range test (Table 5). Entries are for those treatment conditions where the differences between treatment means are statistically significant at P ≤ .05 or better. Mean identification accuracy for the treatment condition yielding the best imagery (code 1-1-1: largest scale, without haze, and without blur) was significantly greater than that obtained under all other treatment conditions. Any reduction in quality, whether single or multidimensional, significantly decreased the identification accuracy of the interpreter.

Without exception, results for treatment conditions in which only one dimension was less than optimal followed the same pattern. Mean performance under these single-factor degrading conditions did not differ significantly from the mean performance obtained under other single-factor degrading conditions. However, the mean performance for these single-factor degradations differed significantly from the mean performance obtained when the smallest scale imagery was degraded on one or two additional dimensions. Finally, for large scale imagery such as that produced under treatment condition 1-3-4 (largest scale, maximum haze, and blur), mean performance was significantly better than when small scale imagery was degraded by haze or by haze and blur.
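The identification accuracy pattern can be summarized by grouping the Table 3 means. The grouping rule (the number of dimensions departing from level 1, and whether scale is at its smallest level) is an assumed reading of the text, and the names are hypothetical. Averaging condition means this way is descriptive only and does not replace the Duncan's test on which the significance statements rest.

```python
# Mean identification accuracy by treatment condition, from Table 3.
ID_ACCURACY = {
    "1-1-1": .564, "2-1-1": .467, "3-1-1": .392, "4-1-1": .374,
    "1-2-1": .422, "1-3-1": .420, "1-1-2": .445, "1-1-3": .412,
    "1-1-4": .402, "1-3-4": .369, "4-1-4": .242, "4-3-1": .189,
    "4-3-4": .196,
}


def group_of(code):
    """Assumed grouping: count dimensions below level 1, then split combinations by scale."""
    scale, haze, motion = (int(part) for part in code.split("-"))
    degraded = sum(level != 1 for level in (scale, haze, motion))
    if degraded == 0:
        return "undegraded"
    if degraded == 1:
        return "single factor"
    return "smallest scale + other" if scale == 4 else "largest scale + other"


def group_means(means):
    """Average the condition means within each group (descriptive only)."""
    groups = {}
    for code, value in means.items():
        groups.setdefault(group_of(code), []).append(value)
    return {name: sum(values) / len(values) for name, values in groups.items()}


for name, value in group_means(ID_ACCURACY).items():
    print(f"{name}: {value:.3f}")
```

With the Table 3 values, the eight single-factor conditions average about .417, the three small-scale combinations about .209, condition 1-3-4 stands at .369, and condition 1-1-1 at .564.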


Identification  Completeness 

Table B-7 lists the identification completeness scores for the 26 subjects for each of the 13 treatment conditions, and Table B-8 gives similar scores for these men for each of the 13 image scenes. Table B-9 summarizes the analysis of variance for these data. Main effects (images, subjects, and experimental conditions) were statistically significant at better than the one percent level.

DUNCAN'S MULTIPLE RANGE TEST FOR DIFFERENCES AMONG

Significant mean differences are listed above the diagonal (Difference = Column Mean - Row Mean).

Statistical significance level of differences is indicated by an underscore where P ≤ .01 and without an underscore where P ≤ .05.

The differences in mean performance among the 13 treatment conditions
were compared using Duncan's multiple range test. Table 6 shows that
identification completeness performance followed a pattern quite similar
to that obtained for identification accuracy. For the best quality
imagery used in the experiment (code 1-1-1), identification completeness
was significantly superior to that obtained from any of the twelve
experimental variants of the best condition.

For  any  of  the  single-factor  degrading  conditions,  identification 
completeness  was  better  than  that  obtained  for  imagery  of  the  smallest 
scale  subjected  to  additional  degradation  in  one  or  both  of  the  other 
two  degrading  dimensions.  The  largest  scale  imagery  degraded  maximally 
by  haze  and  blur  (code  1-3-4)  gave  results  similar  to  those  of  the 
single  degrading  conditions.  It  appears  that  when  the  largest  scale 
imagery  is  degraded  by  haze  and  blur,  the  decrement  in  identification 
completeness  is  significantly  less  than  when  the  smallest  scale  imagery 
is  degraded  by  either  haze  or  blur  or  by  both. 

The pattern of significant mean differences for identification
completeness differs from that obtained for identification accuracy in
the following eight instances:

1) Imagery of 1:4,000 scale with no other degradation yielded better
completeness performance than imagery of 1:12,000 scale without
additional degradation.
2) Imagery of 1:8,000 scale with no other degradation resulted in better
identification completeness scores than imagery of 1:12,000 scale with
no other degradation.
3) Imagery of 1:4,000 scale, with scale as the only degrading factor,
gave better completeness results than imagery of 1:2,000 scale degraded
by maximum haze and blur.
4) Imagery of 1:8,000 scale, with scale as the only degrading factor,
resulted in better identification completeness than imagery of 1:2,000
scale with maximum haze and blur.
5) Imagery of 1:4,000 scale with no other degradation resulted in more
complete identification than imagery of 1:2,000 scale with maximum haze
but no blur.
6) Imagery of 1:2,000 scale, without haze but with the smallest
appreciable amount of blur, produced more complete responses than
1:12,000 scale imagery without added degradation.
7) The same large scale imagery described in (6) was superior to imagery
of the largest scale with maximal amounts of haze and blur.
8) The same large scale imagery described in (6) resulted in more
complete performance than imagery of the largest scale with maximal haze
but without blur.

The small loss in quality resulting from the introduction of the smallest
discrete amount of blurring did not produce any marked decrease in
identification completeness. This result may have been due to the fact
that the blurring effect was one-dimensional and the amount of movement
was relatively small. The one-dimensional nature of the blurring effect
was a consequence of the method used to simulate this dimension: the film
onto which the image was being copied was moved at controlled rates, and
blurring of the image took place along the line of this movement.
All three experimental dimensions used to vary image quality in this
experiment produced significant differences in identification
completeness within the range employed in this research. The effect of
any combination of these degrading factors was more pronounced when the
image scale was very small.


Absolute  Levels  of  Performance 

Identification Accuracy. In the preceding paragraphs, the relative
aspects of identification accuracy and their dependence on the various
treatment conditions were discussed. The absolute level of identification
accuracy attained in the experiment also deserves discussion. Under the
best condition of image quality, identification accuracy for the average
interpreter was only about .5. This level of performance is not atypical
of the results obtained in other surveillance research experiments.
However, an examination of the factors operating in this specific
experiment may help to explain why the absolute level of identification
accuracy was not higher.

Identification accuracy as an index of performance is based on the
number of target responses made by the interpreter. This number of
responses is the denominator of the fraction and includes the number of
correct identifications plus the number of target misidentifications
plus the number of non-targets erroneously identified as targets
(inventive responses). The numerator of the fraction is the number of
correct identifications. The number of correct responses depends directly
on the level of detail to which the targets must be identified. For this
experiment, the interpreters were required to identify the targets rather
precisely; trucks, for example, were to be identified by tonnage. A
response of "truck" was not scored as correct for an imaged object that
was a 2 1/2-ton truck. This requirement for precision in naming target
objects reduced the number of correct identifications and increased the
number of misidentifications. The numerator of the identification
accuracy index was thereby reduced while the denominator was increased.
This is one of the factors operating to reduce the size of the
identification accuracy index.
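The accuracy index described above is a simple ratio. The Python sketch below is a minimal illustration only: the function name and all counts are hypothetical, not data from this study. It shows the two mechanisms discussed in the text, namely that stricter scoring removes responses from the numerator and that inventive responses add to the denominator.

```python
def identification_accuracy(correct: int, misidentified: int, inventive: int) -> float:
    """Correct identifications divided by all target responses.

    All target responses = correct identifications + target
    misidentifications + inventive responses (non-targets named as targets).
    """
    responses = correct + misidentified + inventive
    if responses == 0:
        raise ValueError("interpreter made no target responses")
    return correct / responses


# Hypothetical counts for one interpreter (illustrative only).
lenient = identification_accuracy(correct=60, misidentified=20, inventive=20)

# Stricter scoring (e.g. a generic "truck" no longer counts for a
# 2 1/2-ton truck) moves some answers from correct to misidentified.
strict = identification_accuracy(correct=45, misidentified=35, inventive=20)

# Additional inventive responses enlarge only the denominator.
strict_more_inventive = identification_accuracy(correct=45, misidentified=35, inventive=40)

print(lenient, strict, strict_more_inventive)  # 0.6 0.45 0.375
```

The counts are chosen only to make the arithmetic easy to follow; the same function applies to any interpreter's tallies.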

A second factor associated with the absolute level of identification
accuracy relates to the number of non-target annotations that were
wrongly judged to contain targets. The number of such erroneous
detections in this particular research may be unduly large because of
the way non-target areas were annotated. Here, the non-target annotations
in the imagery were selected deliberately to include terrain features and
man-made objects of the types interpreters frequently confuse with
tactical targets. Objects such as rocks, rectangular outlines,
highlighted tree crowns, shadows, and wet spots on the road were
annotated. The nature of the non-target annotations used in this
experiment may have increased the likelihood that an interpreter would
name the non-targets as target objects. Such inventive responses
increased the denominator of the identification accuracy index and
thereby made the index smaller. These two factors may account for the
relatively low absolute level of identification accuracy obtained in the
present experiment.
Identification Completeness. The absolute level of identification
completeness obtained by the subjects of this experiment merits comment.
The index of identification completeness for the best imagery (code
1-1-1) used in the experiment was about 64 percent. Why were fewer than
two-thirds of the targets properly identified under the best of
conditions? Insufficient working time might be advanced as a reasonable
explanation; however, time was not responsible, since the interpreters
were allowed enough time to complete each annotation. A second possible
cause is the level of detail required for a correct response. To be
scored as a correct identification, the interpreter had to identify the
target rather explicitly. This is the same argument as that presented in
the discussion of identification accuracy. This requirement for an exact
target name reduces the number of correct responses and consequently the
value of the numerator of the completeness index.

A third possible cause concerns the nature of the stimulus material.
Some of the annotations encircled multiple targets. If an annotation
contained three M-48 tanks, the interpreter had to report all three in
order to obtain full identification credit. The interpreters had been
instructed to report all targets contained in each annotated area and
had been informed that some annotations would contain multiple targets.
They were not told which of the annotations actually contained the
multiple targets. Previous research has shown that interpreters
sometimes report one target of a cluster but fail to notice or fail to
report the adjacent targets in the cluster. Therefore, the presence of
multiple targets may have lowered the level of completeness attained by
the subjects in this experiment.

A fourth factor that may have been operating is the limit imposed by the
level of detection completeness achieved by each interpreter. For each
annotation, the interpreter judged whether a target was or was not
present. For those annotations judged not to contain a target, the
interpreter made no identification. Therefore, the more actual target
annotations the interpreter erroneously classified as non-target
annotations, the lower his ceiling on identification completeness became.
Maximal identification completeness would have been obtained had the
interpreter classified every annotation as a target annotation and then
reported his best estimate of the identity of the real or imagined
targets present in these annotations. To indicate how failure to classify
annotations properly limits identification completeness, imagine that an
interpreter judged only 20 of the 104 target annotations contained in the
imagery to be target annotations and then correctly identified one target
in each of the 20 annotations. His identification completeness score
would be 20/104 (there were 104 targets present in the 104 target
annotations), or about 19 percent. The extent to which proper
classification of annotations limited identification completeness might
be sought by referring to the values listed in Table 3. However, these
figures refer to detection performance for target annotations and
non-target annotations combined: the interpreter's correct
classifications of non-target areas and of target areas were summed and
expressed as a ratio of 200, the total number of target plus non-target
annotations.
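The ceiling argument and the combined ratio above can be made concrete with a few lines of arithmetic. The Python sketch below assumes, as the text's example does, one target per target annotation; the function names are hypothetical, and the figure of 96 non-target annotations is simply 200 minus 104. The last line illustrates why the combined ratio cannot isolate target detection: two quite different splits of correct classifications give the same score.

```python
TARGET_ANNOTATIONS = 104   # from the text
ALL_ANNOTATIONS = 200      # target plus non-target annotations, from the text
NON_TARGET_ANNOTATIONS = ALL_ANNOTATIONS - TARGET_ANNOTATIONS  # 96, by subtraction


def identification_completeness(correct_ids: int, targets_present: int = TARGET_ANNOTATIONS) -> float:
    """Correct identifications divided by the number of targets present."""
    return correct_ids / targets_present


def completeness_ceiling(annotations_called_target: int, targets_present: int = TARGET_ANNOTATIONS) -> float:
    """Best completeness possible if each target annotation holds one target:
    annotations judged non-target receive no identification at all."""
    return annotations_called_target / targets_present


def combined_detection_accuracy(targets_correct: int, non_targets_correct: int) -> float:
    """Correct classifications of both annotation types, as a ratio of 200."""
    return (targets_correct + non_targets_correct) / ALL_ANNOTATIONS


# The text's example: 20 target annotations detected, one correct ID in each.
print(round(identification_completeness(20), 3))  # 0.192
print(round(completeness_ceiling(20), 3))          # 0.192

# Hypothetical splits: very different target detection, same combined score.
print(combined_detection_accuracy(97, 32), combined_detection_accuracy(65, 64))  # 0.645 0.645
```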

Detection Performance. Table 7 gives detection completeness for the
target and non-target annotations separately and repeats the data from
Table 3 for the combined results.

Table 7

MEAN DETECTION ACCURACY (COMPLETENESS) BY ANNOTATION TYPE

Treatment    Target Annotations    Non-Target Annotations    All Annotations
Condition     Mean      S.D.         Mean      S.D.            Mean      S.D.

1-1-1         .935      .110         .666      .281            .796      .152
2-1-1         .930      .110         .633      .260            .770      .161
3-1-1         .938      .079         .500      .287            .743      .141
4-1-1         .824      .261         .577      .272            .736      .131
1-2-1         .916      .119         .569      .274            .752      .140
1-3-1         .912      .122         .667      .218            .774      .177
1-1-2         .931      .102         .601      .270            .771      .183
1-1-3         .915      .123         .631      .244            .749      .172
1-1-4         .924      .134         .616      .287            .764      .152
1-3-4         .859      .209         .582      .268            .738      .142
4-1-4         .658      .373         .556      .304            .670      .174
4-3-1         .691      .304         .452      .299            .622      .191
4-3-4         .699      .268         .499      .315            .632      .154

MEANS         .856      .224         .581      .284            .732      .169

One striking feature can be seen in the performance for non-target
annotations. As image quality was degraded, the correct classification of
both target and non-target annotations declined. However, for non-target
annotations, detection accuracy (completeness) dropped to chance
performance or below. If the interpreter were to toss a coin for each
annotation (heads, it's a target; tails, it's a non-target), one would
expect that over a large number of such annotations he would classify
about 50 percent correctly. Even for the best quality imagery, the mean
performance for correct classification of non-target annotations was only
67 percent; that is, one-third of the non-target annotations were
classified as target annotations when in fact no targets were present.

The mean detection accuracy or completeness for target annotations was
about 86 percent, with a high of 94 percent for the best quality imagery
and a low of 66 percent for one of the poorer quality image variants. It
appears that as image quality is degraded, the average interpreter is
less able to detect target cues and signatures and, as a consequence,
classifies more of the target annotations as non-target annotations.
However, under no circumstance in this experiment did detection
performance for target annotations deteriorate to chance level (50
percent). The analysis of variance summary for detection accuracy
performance for target annotations appears in Table B-10 and that for
non-target annotations in Table B-11. Main effects (subjects, images, and
experimental conditions) are all significant beyond the .01 level.
Differences among treatment condition means were tested using Duncan's
multiple range technique and are reported in Table B-12 for target
annotations and in Table B-13 for non-target annotations. The pattern of
significant differences for target annotations is quite similar to that
obtained for all annotations, which appeared in Table 4. Results for
non-target annotations suggest that the interpreters found the
classification task much more difficult for non-target objects:
detection performance dropped to chance level when only a moderate degree
of photo degradation was introduced.
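The chance-level comparisons in the two paragraphs above can be checked directly against the Table 7 means. The Python sketch below copies those means and treats any mean at or below .50 (including exactly .500) as chance level. That cut-off is the coin-toss expectation from the text, not a significance test, and the variable names are hypothetical.

```python
# Mean detection completeness from Table 7: (target annotations, non-target annotations).
TABLE_7 = {
    "1-1-1": (0.935, 0.666), "2-1-1": (0.930, 0.633), "3-1-1": (0.938, 0.500),
    "4-1-1": (0.824, 0.577), "1-2-1": (0.916, 0.569), "1-3-1": (0.912, 0.667),
    "1-1-2": (0.931, 0.601), "1-1-3": (0.915, 0.631), "1-1-4": (0.924, 0.616),
    "1-3-4": (0.859, 0.582), "4-1-4": (0.658, 0.556), "4-3-1": (0.691, 0.452),
    "4-3-4": (0.699, 0.499),
}

CHANCE = 0.5  # expected proportion correct if each annotation were classified by a coin toss

non_target_at_chance = [code for code, (_, nt) in TABLE_7.items() if nt <= CHANCE]
lowest_target = min(t for t, _ in TABLE_7.values())

print(non_target_at_chance)                                  # ['3-1-1', '4-3-1', '4-3-4']
print(round(len(non_target_at_chance) / len(TABLE_7), 2))    # 0.23
print(lowest_target, lowest_target > CHANCE)                 # 0.658 True
```

Three of the 13 conditions (about one-fourth) fall at or below chance for non-target annotations, while the lowest target-annotation mean stays well above .50.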

From the foregoing discussion, it seems that the level of detection
accuracy (completeness) was not one of the limiting factors responsible
for the modest level of target identification completeness obtained in
the experiment. For example, for the best quality imagery (code 1-1-1),
the mean detection completeness for target annotations was 94 percent.
Therefore, about 94 percent of the targets in the imagery were available
to the interpreters for identification, but observed identification
completeness was only 64 percent. For this reason, it seems that the
absolute level of identification completeness obtained must be attributed
to the presence of multiple targets in the annotations and to the
difficulty of the identification task, that is, the level of detail
required in order to receive credit for a correct target identification.
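The argument above splits the shortfall from 100 percent into a part lost at detection and a part lost after detection. The Python sketch below makes that split explicit for code 1-1-1. It is a simplification that assumes one target per target annotation (in the experiment some annotations held several targets), and the variable names are hypothetical.

```python
detected = 0.935    # Table 7, code 1-1-1: mean detection completeness, target annotations
identified = 0.64   # text: identification completeness for code 1-1-1, "about 64 percent"

lost_at_detection = 1.0 - detected            # targets never passed on for identification
lost_after_detection = detected - identified  # targets detected but not credited
share_of_available_identified = identified / detected

print(f"{lost_at_detection:.3f} {lost_after_detection:.3f} {share_of_available_identified:.2f}")
# 0.065 0.295 0.68
```

Under this simplification, most of the shortfall (about 30 of the 36 percentage points) arises after detection, which is consistent with the conclusion drawn in the text.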

The  following  observations  sum  up  this  discussion  of  absolute 
levels  of  performance: 

The  absolute  level  of  detection  accuracy  was  to  a  large  extent 
determined  by  the  difficulty  of  the  non-target  annotations.  For  about 
one-fourth  of  the  experimental  conditions,  detection  performance  for 
non-target  annotations  was  at  chance  level  (50  percent). 

The absolute level of identification accuracy appeared to be
dependent on the level of detail required for a correct identification
response and on the number of inventive errors made by the interpreters.

The  absolute  level  of  identification  completeness  seemed  to  be 
governed  by  the  presence  of  multiple  targets  in  the  target  annotations 
and  by  the  level  of  detail  demanded  of  the  interpreter  in  order  to 
obtain  credit  for  a  correct  response. 

The levels of performance obtained in this experiment do not apply
directly to the operational situation. The imagery for the experiment
was annotated, and subjects were paced through the imagery annotation
by annotation (a "directed search" condition). The absolute performance
levels for detection and identification may be very different from those
that might have been obtained in a "free search" situation, in which
there are no annotations and the interpreter must both search for targets
and identify the objects.


CONCLUSIONS 

With respect to the effects of the degradation sources included in
the experiment, the following conclusions appear warranted:

Any degradation of photo quality, unidimensional or multidimensional,
significantly reduces the accuracy and completeness of target
identification.

Unidimensional degradation of photo quality does not significantly
reduce the level of detection performance (accuracy/completeness),
whereas multidimensional degradation is associated with significant
deterioration of detection performance.

In  general,  detection  and  identification  performance  for  imagery 
degraded  on  only  a  single  dimension  is  significantly  superior  to  that 
for  imagery  degraded  on  more  than  one  dimension. 

The effect of degradation of photo quality by haze, blur, or both is
more pronounced for small scale imagery than for large scale imagery.

Performance  as  measured  by  all  three  dependent  measures  differed 
significantly  between  interpreters.  Within  interpreters,  performance 
differed  significantly  by  scene  content  (complexity)  and  by  kind  and 
amount  of  photo  quality  degradation. 

The research was conducted for the purpose of identifying additional
dimensions of photo quality that should be included in the development
of the next generation of the BESRL photo quality catalog. The following
results appear applicable to that goal: Haze, produced operationally by
atmospheric attenuation, should be represented. While this source of
degradation does not have an enormous effect on interpretation
performance in isolation, it does result in significant deterioration of
performance when coupled with other sources of degradation. As a
dimension of photo quality, haze should be defined and quantified, and
its effect in interaction with scale, sharpness, and scene complexity
determined, so that it can be adequately covered in the catalog imagery.
Appendix (continued)

Table B-12. Duncan's multiple range test for differences among treatment
means for detection accuracy performance for target annotations only

Table B-13. Duncan's multiple range test for differences among treatment
means for detection accuracy performance for non-target annotations only
Table A-1

SCALE OF IMAGERY PRODUCED FOR TESTS

Image    Scale for Various Experimental Conditions
Number   1(a)       2          3           4            5-10       11-13

1        1:2100     1:3810     1:8200      1:12,600     1:2100     1:12,600
2        1:2100     1:3810     1:8200      1:12,600     1:2100     1:12,600
3        1:2000     1:3650     1:8000      1:12,000     1:2000     1:12,000
4        1:2400     1:4360     1:9600      1:14,500     1:2400     1:14,500
5        1:2400     1:4360     1:9600      1:14,500     1:2400     1:14,500
6        1:2400     1:4360     1:9600      1:14,500     1:2400     1:14,500
7        1:2000     1:3650     1:8000      1:12,000     1:2000     1:12,000
8        1:3300     1:6000     1:13,250    1:20,000     1:3300     1:20,000
9        1:3200     1:5800     1:12,800    1:19,400     1:3200     1:19,400
10       1:3200     1:5800     1:12,800    1:19,400     1:3200     1:19,400
11       1:2000     1:3650     1:8000      1:12,000     1:2000     1:12,000
12       1:2000     1:3650     1:8000      1:12,000     1:2000     1:12,000
13       1:2000     1:3650     1:8000      1:12,000     1:2000     1:12,000

(a) Original negative scale.
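Although the nominal scales differ from image to image, the reduced scales in Table A-1 appear to stand in a roughly constant ratio to each image's original negative scale. The Python sketch below computes those ratios from the table's scale denominators. The helper name is hypothetical, and the "roughly constant factor" reading is an inference from the numbers, not a procedure stated in the report. The images not listed repeat one of these rows exactly.

```python
# Denominators of the representative fractions in Table A-1 for
# conditions 1 (original negative), 2, 3 and 4. The remaining images
# repeat one of these rows exactly.
SCALES = {
    1: (2100, 3810, 8200, 12600),
    3: (2000, 3650, 8000, 12000),
    4: (2400, 4360, 9600, 14500),
    8: (3300, 6000, 13250, 20000),
    9: (3200, 5800, 12800, 19400),
}


def reduction_factors(denominators):
    """How many times smaller each reduced scale is than the original negative."""
    original, *reduced = denominators
    return [d / original for d in reduced]


for image, denoms in SCALES.items():
    print(image, [round(f, 2) for f in reduction_factors(denoms)])
```

For every image the factors come out near 1.8, 4 and 6 for conditions 2, 3 and 4, respectively.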
Table A-2

EXPERIMENTAL CONDITIONS FOR IMAGE X EXAMINEE COMBINATIONS(a)

Ind.
Code                           Image Number
No.     1   2   3   4   5   6   7   8   9  10  11  12  13  14

 1      3   1  12   4   9   6   2   7  10   3   5  13   8  11
 2      3   2  13   5  10   7   3   8  11   4   6   1   9  12
 3      3   3   1   6  11   8   4   9  12   5   7   2  10  13
 4      3   4   2   7  12   9   5  10  13   6   8   3  11   1
 5      3   5   3   8  13  10   6  11   1   7   9   4  12   2
 6      3   6   4   9   1  11   7  12   2   8  10   5  13   3
 7      3   7   5  10   2  12   8  13   3   9  11   6   1   4
 8      3   8   6  11   3  13   9   1   4  10  12   7   2   5
 9      3   9   7  12   4   1  10   2   5  11  13   8   3   6
10      3  10   8  13   5   2  11   3   6  12   1   9   4   7
11      3  11   9   1   6   3  12   4   7  13   2  10   5   8
12      3  12  10   2   7   4  13   5   8   1   3  11   6   9
13      3  13  11   3   8   5   1   6   9   2   4  12   7  10
14      3  13   1   7   8   9   3   5   6  10   2   4  11  12
15      3  11  12  13   1   7  10   8   9   5   6   2   3   4
16      3   3   4  11  12  13   5   1   7   8   9   6  10   2
17      3  10   2   3   4  11   8  12  13   1   7   9   5   6
18      3   5   6  10   2   3   1   4  11  12  13   7   8   9
19      3   8   9   5   6  10  12   2   3   4  11  13   1   7
20      3   1   7   8   9   5   4   6  10   2   3  11  12  13
21      3  12  13   1   7   8   2   9   5   6  10   3   4  11
22      3   4  11  12  13   1   6   7   8   9   5  10   2   3
23      3   2   3   4  11  12   9  13   1   7   8   5   6  10
24      3   6  10   2   3   4   7  11  12  13   1   8   9   5

(a) Experimental conditions are keyed as shown in Table 1.
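The first thirteen rows of Table A-2 appear to follow a cyclic pattern. In the last 13 columns, each row equals the row above with every condition number advanced by one, with 13 wrapping round to 1. If that reading is right, those rows form a Latin square: each of these examinees sees each of the 13 conditions once, and each column receives each condition once across the 13 examinees. The Python sketch below checks this reading against rows 1, 7 and 13 as transcribed. The helper names are hypothetical, and the cyclic construction is an inference from the table, not a procedure stated in the report.

```python
N = 13  # number of treatment conditions

# Rows 1 and 13 of Table A-2, last 13 columns (transcribed from the table).
ROW_1 = [1, 12, 4, 9, 6, 2, 7, 10, 3, 5, 13, 8, 11]
ROW_13 = [13, 11, 3, 8, 5, 1, 6, 9, 2, 4, 12, 7, 10]


def shift(row, k, n=N):
    """Add k to every condition number, wrapping around within 1..n."""
    return [(v - 1 + k) % n + 1 for v in row]


def is_latin_square(rows, n=N):
    """True if there are n rows and every row and column is a permutation of 1..n."""
    full = set(range(1, n + 1))
    if len(rows) != n or any(len(r) != n or set(r) != full for r in rows):
        return False
    return all({r[c] for r in rows} == full for c in range(n))


square = [shift(ROW_1, k) for k in range(N)]
print(square[12] == ROW_13)       # True
print(is_latin_square(square))    # True
```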
Table B-3

ANALYSIS OF VARIANCE SUMMARY: DETECTION ACCURACY SCORES

Source of       Sum of               Mean
Variation       Squares      df      Square      F          F.95    F.99

Between:
  Subjects      1.71793      25      .06872      4.16**     1.55    1.84
Within:
  Images        2.21050      12      .18421      11.15**    1.79    2.25
  Conditions     .95004      12      .07917      4.79**     1.79    2.25
  Residual      4.75798     288      .01652      -          -       -
TOTAL           9.63645     337      -           -          -       -

**Means significantly different, P ≤ .01.
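Each analysis of variance summary in this appendix follows the usual arithmetic: mean square = sum of squares / df, and F = effect mean square / residual mean square. The Python sketch below recomputes Table B-3 from its sums of squares and degrees of freedom and checks that both columns add up to the totals. The function name is hypothetical, and the F.99 critical values are copied from the table rather than computed.

```python
from typing import Dict, Tuple


def anova_summary(ss: Dict[str, float], df: Dict[str, int],
                  error: str = "Residual") -> Dict[str, Tuple[float, float]]:
    """Return {source: (mean_square, F)} for every source except the error term.

    MS = SS / df for each source; F = MS_source / MS_error.
    """
    ms_error = ss[error] / df[error]
    out = {}
    for source in ss:
        if source == error:
            continue
        ms = ss[source] / df[source]
        out[source] = (ms, ms / ms_error)
    return out


# Sums of squares and degrees of freedom from Table B-3.
ss_b3 = {"Subjects": 1.71793, "Images": 2.21050, "Conditions": 0.95004, "Residual": 4.75798}
df_b3 = {"Subjects": 25, "Images": 12, "Conditions": 12, "Residual": 288}
f99_b3 = {"Subjects": 1.84, "Images": 2.25, "Conditions": 2.25}  # F.99 column of the table

for source, (ms, f) in anova_summary(ss_b3, df_b3).items():
    flag = "**" if f > f99_b3[source] else ""
    print(f"{source:<10} MS={ms:.5f} F={f:.2f}{flag}")

# Additivity: the components should sum to the TOTAL row (SS 9.63645, df 337).
print(round(sum(ss_b3.values()), 5), sum(df_b3.values()))
```

Substituting the sums of squares and df of Tables B-6, B-9, B-10 or B-11 reproduces their mean-square and F columns in the same way.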
IDENTIFICATION ACCURACY SCORES: SUBJECT BY IMAGE
Table B-6

ANALYSIS OF VARIANCE SUMMARY: IDENTIFICATION ACCURACY SCORES

Source of       Sum of               Mean
Variation       Squares      df      Square      F          F.95    F.99

Between:
  Subjects      2.47175      25      .09887      3.20**     1.55    1.84
Within:
  Images        7.79496      12      .64958      21.00**    1.79    2.25
  Conditions    3.64678      12      .30390      9.83**     1.79    2.25
  Residual      8.90782     288      .03093      -          -       -
TOTAL          22.82131     337      -           -          -       -

**Means significantly different, P ≤ .01.
IDENTIFICATION COMPLETENESS SCORES: SUBJECT BY TREATMENT CONDITION

IDENTIFICATION COMPLETENESS SCORES: SUBJECT BY IMAGE
Table B-9

ANALYSIS OF VARIANCE SUMMARY: IDENTIFICATION COMPLETENESS SCORES

Source of       Sum of               Mean
Variation       Squares      df      Square      F          F.95    F.99

Between:
  Subjects      2.35077      25      .09403      2.16**     1.55    1.84
Within:
  Images        3.21364      12      .26780      6.15**     1.79    2.25
  Conditions    6.44704      12      .53725      12.34**    1.79    2.25
  Residual     12.53695     288      .04353      -          -       -
TOTAL          24.54840     337      -           -          -       -

**Means significantly different, P ≤ .01.




Table B-10

ANALYSIS OF VARIANCE SUMMARY:
DETECTION ACCURACY SCORES FOR TARGET ANNOTATIONS

Source of       Sum of               Mean
Variation       Squares      df      Square      F          F.95    F.99

Between:
  Subjects      2.517965     25      .100719     2.92**     1.55    1.84
Within:
  Images        1.044532     12      .087044     2.52**     1.79    2.25
  Conditions    3.406700     12      .283892     8.24**     1.79    2.25
  Residual      9.926800    288      .034468     -          -       -
TOTAL          16.895997    337      -           -          -       -

**Means significantly different, P ≤ .01.
Table B-11

ANALYSIS OF VARIANCE SUMMARY:
DETECTION ACCURACY SCORES FOR NON-TARGET ANNOTATIONS

Source of       Sum of               Mean
Variation       Squares      df      Square      F          F.95    F.99

Between:
  Subjects     10.013154     25      .400526     9.86**     1.55    1.84
Within:
  Images        4.103743     12      .341979     8.42**     1.79    2.25
  Conditions    1.361924     12      .113494     2.79**     1.79    2.25
  Residual     11.698229    288      .040619     -          -       -
TOTAL          27.177050    337      -           -          -       -

**Means significantly different, P ≤ .01.